An Optimal Seed Based Compression Algorithm for DNA Sequences
Similar resources
An Optimal Seed Based Compression Algorithm for DNA Sequences
This paper proposes a seed-based lossless compression algorithm for DNA sequences that uses a substitution method similar to the Lempel-Ziv compression scheme. The proposed method exploits the repetition structures inherent in DNA sequences by creating an offline dictionary that contains all such repeats along with the details of mismatches. By ensuring that only prom...
Grammar-based Compression of DNA Sequences
Grammar-based compression algorithms infer context-free grammars to represent the input data. The grammar is then transformed into a symbol stream and finally encoded in binary. We explore the utility of grammar-based compression of DNA sequences. We strive to optimize the three stages of grammar-based compression to work optimally for DNA. DNA is notoriously hard to compress, and ultimately, o...
Compression of DNA Sequences
We propose a lossless algorithm to compress the information contained in DNA sequences. None of the available universal algorithms compress such data. This is due to the specificity of genetic information. Our method is based on regularities, such as the presence of palindromes, in the DNA. The results we obtain, although not satisfactory, are far beyond classical algorithms.
Fast Discerning Repeats in DNA Sequences with a Compression Algorithm
Long direct repeats in genomes arise from molecular duplication mechanisms like retrotransposition, copy of genes, exon shuffling, ... Their study in a given sequence reveals its internal repeat structure as well as part of its evolutionary history. Moreover, detailed knowledge about the mechanisms can be gained from a systematic investigation of repeats. The problem of finding such repeats is vi...
A Fast Algorithm for Exonic Regions Prediction in DNA Sequences
The main purpose of this paper is to introduce a fast method for gene prediction in DNA sequences based on the period-3 property in exons. First, the symbolic DNA sequences are converted to a digital signal using the EIIP method. Then, to reduce the effect of background noise in the period-3 spectrum, we use the discrete wavelet transform (DWT) at three levels and apply it on the input digital sig...
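The first step described in the abstract above (mapping nucleotides to a numeric signal and looking for period-3 energy) can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the function names `eiip_signal` and `period3_power` are hypothetical, the EIIP values are the ones commonly cited in the literature rather than taken from this page, and the wavelet (DWT) denoising step mentioned in the abstract is omitted entirely.

```python
import cmath

# Commonly cited EIIP (electron-ion interaction potential) values per
# nucleotide. Assumption: these are not stated on this page.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_signal(seq):
    """Map a DNA string to a list of EIIP values (case-insensitive)."""
    return [EIIP[base] for base in seq.upper()]


def period3_power(signal):
    """Normalized squared magnitude of the DFT at frequency 1/3.

    The mean is removed first so that a constant offset does not
    contribute. A large value suggests a strong period-3 component.
    """
    n = len(signal)
    mean = sum(signal) / n
    acc = sum(
        (x - mean) * cmath.exp(-2j * cmath.pi * i / 3)
        for i, x in enumerate(signal)
    )
    return abs(acc) ** 2 / n


if __name__ == "__main__":
    codon_like = "ATG" * 30   # artificial sequence with exact period 3
    alternating = "AC" * 45   # artificial sequence with period 2
    print(period3_power(eiip_signal(codon_like)))
    print(period3_power(eiip_signal(alternating)))
```

In practice one would evaluate `period3_power` over a sliding window along the sequence and compare against a threshold; the window length and threshold are design choices this sketch leaves open.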
Journal
Journal title: Advances in Bioinformatics
Year: 2016
ISSN: 1687-8027,1687-8035
DOI: 10.1155/2016/3528406